Organizations collect more data and retain data for
longer than ever before. The phenomenal growth of the storage
manufacturing industry over the past 10 years is a testament to
continually increasing data collection. Given adequate capacity,
storing large quantities of data within SQL Server is no big problem,
until we need to retrieve that data or perform maintenance. Retrieving
single rows or running range searches against multiterabyte databases,
while maintaining good response times, presents new challenges.
Partitioning was first available in SQL Server 7.0, although in that
version application logic was required to determine the partition
holding a specific row. In SQL Server 2000 it was possible to define a
view that unified the data, and in SQL Server 2005 table partitions
became completely transparent to applications. SQL Server 2008
Enterprise edition provides the next generation of table and index
partitioning, which introduces a round-robin thread model to satisfy
queries accessing multiple partitions. Additionally, SQL Server 2008
includes a new level of lock escalation, which means locks can escalate
from row or page locks to partition locks. This differs from SQL Server
2005, where row or page locks could only be escalated directly to table
locks.
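As a minimal sketch of the partition-level lock escalation described above, the table option below enables it in SQL Server 2008. The table name dbo.Orders is hypothetical; the default setting (TABLE) keeps the SQL Server 2005 behavior.

```sql
-- dbo.Orders is a hypothetical partitioned table used for illustration.
-- AUTO lets locks on a partitioned table escalate to the partition
-- rather than to the whole table.
ALTER TABLE dbo.Orders SET (LOCK_ESCALATION = AUTO);

-- Confirm the current setting for the table.
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = N'Orders';
```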
Horizontal Partitioning
Horizontal partitioning involves dividing a large table into a number
of smaller tables, each containing all columns for a subset of rows.
Because each table is much smaller, access is typically more efficient
and therefore faster. Partitioning maintains data integrity, and it is
possible to partition on almost any column; however, date ranges are
the most common choice. This allows administrators to separate recent
(usually more active) data from older archive data, improving access
performance for the frequently accessed recent rows.
Vertical Partitioning
This method splits a large table into a number of smaller tables; each
table contains every row, but only a subset of the columns. The
database normalization process provides a form of vertical partitioning
by moving attributes not dependent on the primary key into separate
tables, related through a primary key/foreign key constraint. Consider
this method with caution, since retrieving all columns for a given row
will require a join between the tables, which could be expensive in
terms of performance.
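The trade-off can be sketched as follows. All table and column names here are hypothetical: frequently read columns stay in one table, wide and rarely read columns move to a second table that shares the same key.

```sql
-- Hypothetical tables illustrating a vertical split.
CREATE TABLE dbo.Customer
(
    CustomerID int           NOT NULL PRIMARY KEY,
    Name       nvarchar(100) NOT NULL,
    Email      nvarchar(256) NOT NULL
);

CREATE TABLE dbo.CustomerDetail
(
    -- Same key as dbo.Customer, enforced with a foreign key.
    CustomerID int NOT NULL PRIMARY KEY
        REFERENCES dbo.Customer (CustomerID),
    Notes      nvarchar(max)  NULL,
    Photo      varbinary(max) NULL
);
GO

-- Retrieving every column for one row now requires a join.
SELECT c.CustomerID, c.Name, c.Email, d.Notes, d.Photo
FROM dbo.Customer AS c
JOIN dbo.CustomerDetail AS d
    ON d.CustomerID = c.CustomerID
WHERE c.CustomerID = 42;
```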
Filegroups
Successful implementations of table partitioning improve the
availability and performance of the entire table, since many operations
can be performed in parallel. Availability can be improved because
backup and restore operations can be performed on an individual
filegroup. Additionally, performance can be improved because CHECKDB
and regular scan and seek operations can be performed in parallel when
partitions are implemented on their own filegroups.
It’s common for SQL Server databases to grow to hundreds of gigabytes,
and multiterabyte databases are no longer unusual. Even with the
fastest disk storage and backup target, full backups will likely take
many hours to complete. Filegroups help by allowing partial backups, or
by allowing backups to run in parallel. Every database has a primary
filegroup, and if you’re going to create multiple data files it’s
recommended to create these on a secondary filegroup.
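A filegroup-level backup might look like the sketch below. The backup paths are hypothetical, the filegroup name is taken from the Orders example in this article, and the first statement assumes the database uses the full recovery model.

```sql
-- Back up a single filegroup (backup path is hypothetical).
BACKUP DATABASE Orders
    FILEGROUP = 'OrdersGroup1'
    TO DISK = 'J:\Backup\Orders_OrdersGroup1.bak';

-- Partial backup: the primary filegroup plus all read-write
-- filegroups, skipping any filegroups marked read-only.
BACKUP DATABASE Orders
    READ_WRITE_FILEGROUPS
    TO DISK = 'J:\Backup\Orders_partial.bak';
```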
Partitions can operate within a single filegroup; however, to gain the
full benefit of table partitioning, it is recommended that each
partition reside on its own filegroup. With this approach each
filegroup (and therefore each partition) can be stored on a different
disk, which improves I/O throughput and therefore database performance.
The following example creates a new database with a primary filegroup
(required) and four further filegroups, each with one data file:
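CREATE DATABASE Orders
ON PRIMARY
-- Primary data file (required). The logical name, path, and sizes on
-- the next four lines are assumed for illustration.
(NAME = Orders_dat,
FILENAME = 'E:\MDF\Orders_dat.mdf',
SIZE = 100MB,
FILEGROWTH = 10MB),
-- Four secondary filegroups, each with one data file on its own drive
FILEGROUP OrdersGroup1
(NAME = OrdersGrp1Fi1_dat,
FILENAME = 'F:\MDF\OrdersG1Fi1_dat.ndf',
SIZE = 5120MB,
FILEGROWTH = 1024MB),
FILEGROUP OrdersGroup2
(NAME = OrdersGrp2Fi1_dat,
FILENAME = 'G:\MDF\OrdersG2Fi1_dat.ndf',
SIZE = 5120MB,
FILEGROWTH = 1024MB),
FILEGROUP OrdersGroup3
(NAME = OrdersGrp3Fi1_dat,
FILENAME = 'H:\MDF\OrdersG3Fi1_dat.ndf',
SIZE = 5120MB,
FILEGROWTH = 1024MB),
FILEGROUP OrdersGroup4
(NAME = OrdersGrp4Fi1_dat,
FILENAME = 'I:\MDF\OrdersG4Fi1_dat.ndf',
SIZE = 5120MB,
FILEGROWTH = 1024MB)
-- Transaction log on a separate drive
LOG ON
(NAME = Orders_log,
FILENAME = 'E:\LDF\Orders_log.ldf',
SIZE = 5MB,
FILEGROWTH = 1MB)
GO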
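To check the result, a query along these lines lists each filegroup with the files it contains. It is a sketch that relies only on the standard catalog views sys.filegroups and sys.database_files.

```sql
USE Orders;
GO

-- One row per data file, showing the filegroup it belongs to.
SELECT fg.name          AS filegroup_name,
       df.name          AS logical_file_name,
       df.physical_name AS physical_file_name
FROM sys.filegroups AS fg
JOIN sys.database_files AS df
    ON df.data_space_id = fg.data_space_id
ORDER BY fg.name;
```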
The concepts and goals of partitioning are fairly easy to grasp; the
hardest part is understanding the language around the implementation of
table partitioning. Essentially, there are two concepts, the partition
function and the partition scheme, each supported by a comprehensive
wizard and by corresponding T-SQL commands.
Selecting a Partition Key and Number of Partitions
Almost any column can be used as a partition key (large-object types
such as text, image, xml, and varchar(max) are excluded), and this
column is the logical division between partitions. The partition
function implements the partition key, but not the data placement on
disk. It’s important to know the data and data access patterns, since
this knowledge will help in selecting the partition key and the number
of partitions. If there are logical boundaries or groupings in the
data, use these as the partition key; for example, if orders are
typically queried by calendar month or fiscal quarter, these periods
could be a natural choice for the partition boundaries.
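One way to get a feel for the data before choosing boundaries is to count rows per candidate period. The sketch below assumes a hypothetical dbo.Orders table with an OrderDate column.

```sql
-- dbo.Orders and OrderDate are hypothetical names.
-- Row counts per calendar month show how evenly the data would be
-- spread if month boundaries were used for partitioning.
SELECT YEAR(OrderDate)  AS OrderYear,
       MONTH(OrderDate) AS OrderMonth,
       COUNT(*)         AS OrderCount
FROM dbo.Orders
GROUP BY YEAR(OrderDate), MONTH(OrderDate)
ORDER BY OrderYear, OrderMonth;
```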
Partition Function
The partition function is used to map rows to partitions based on the
partition key. The partition function specifies the boundary between
each of the partitions. LEFT or RIGHT determines in which partition
the boundary value itself resides; LEFT is the default. The following
example creates a partition function that partitions orders based on
order date:
CREATE PARTITION FUNCTION [pf_Orders](datetime) AS RANGE LEFT FOR
VALUES(N'2002-01-01T00:00:00',
N'2003-01-01T00:00:00',
N'2004-01-01T00:00:00')
This
statement will create four partitions (despite only three values being
listed). The boundary values are shown as “equal to or less than” since
the partition is created with RANGE LEFT. The data is split between the
partitions as follows:
Partition Number | Partition 1 | Partition 2 | Partition 3 | Partition 4 |
---|---|---|---|---|
RANGE LEFT | <= 2002-01-01 | > 2002-01-01 AND <= 2003-01-01 | > 2003-01-01 AND <= 2004-01-01 | > 2004-01-01 |
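The mapping can be checked with the built-in $PARTITION function, which returns the partition number a given value would fall into. The sketch below assumes pf_Orders has been created as shown above; the sample dates are arbitrary.

```sql
-- Each column returns the partition number for the supplied date.
SELECT $PARTITION.pf_Orders('2001-06-15T00:00:00') AS BeforeFirstBoundary, -- 1
       $PARTITION.pf_Orders('2002-01-01T00:00:00') AS OnFirstBoundary,     -- 1 (RANGE LEFT)
       $PARTITION.pf_Orders('2002-06-15T00:00:00') AS During2002,          -- 2
       $PARTITION.pf_Orders('2004-06-15T00:00:00') AS AfterLastBoundary;   -- 4
```

With RANGE RIGHT the boundary value 2002-01-01 would instead belong to partition 2.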
Partition Scheme
The partition function determines which partition holds each row. The
partition scheme controls the mapping between partitions and
filegroups, and therefore where the data is physically placed.
Performance gains can often be realized by using a 1-1 mapping between
partitions and filegroups, and further by placing each filegroup on its
own logical disk.
The following example creates a partition scheme, mapping each partition to its own filegroup. The filegroups named in the scheme (here fgSalesOrders1 through fgSalesOrders4) must already exist in the database:
CREATE PARTITION SCHEME [ps_Orders] AS PARTITION [pf_Orders]
TO ([fgSalesOrders1], [fgSalesOrders2], [fgSalesOrders3], [fgSalesOrders4])
The partition scheme definition will map the partitions to filegroups as shown in Table 1.
Table 1. Sample Partition Scheme
Partition Number | Partition 1 | Partition 2 | Partition 3 | Partition 4 |
---|---|---|---|---|
RANGE LEFT | <= 2002-01-01 | > 2002-01-01 AND <= 2003-01-01 | > 2003-01-01 AND <= 2004-01-01 | > 2004-01-01 |
Filegroup | fgSalesOrders1 | fgSalesOrders2 | fgSalesOrders3 | fgSalesOrders4 |
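A table is placed on the scheme by naming the scheme and the partitioning column in the ON clause. The sketch below assumes ps_Orders exists as defined above; the table dbo.Orders and its columns are hypothetical. The partitioning column is included in the clustered primary key because a unique index on a partitioned table must contain it.

```sql
-- Hypothetical table created on the ps_Orders partition scheme.
CREATE TABLE dbo.Orders
(
    OrderID    int      NOT NULL,
    OrderDate  datetime NOT NULL,
    CustomerID int      NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID, OrderDate)
) ON ps_Orders (OrderDate);
GO

-- Show each partition of the table, its filegroup, and its row count.
SELECT p.partition_number,
       fg.name AS filegroup_name,
       p.rows
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.destination_data_spaces AS dds
    ON dds.partition_scheme_id = i.data_space_id
   AND dds.destination_id = p.partition_number
JOIN sys.filegroups AS fg
    ON fg.data_space_id = dds.data_space_id
WHERE p.object_id = OBJECT_ID(N'dbo.Orders')
  AND i.index_id IN (0, 1)   -- heap or clustered index only
ORDER BY p.partition_number;
```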
Moving Data between Partitions
Partitioning provides performance and manageability benefits, and
migrating to a partitioned table is relatively pain free. There are
three options for moving data between partitions:
Split Partition
Merge Partition
Switch Partition
Partition splitting is implemented with the partition function and
adds a boundary that divides an existing partition in two. A split is
commonly used when adding a new partition at the end of an existing
range. Merging is likewise administered through the partition function
and combines two adjacent partitions into one. Both merge and split use
the ALTER PARTITION FUNCTION T-SQL command:
ALTER PARTITION FUNCTION pQuantity()
SPLIT RANGE(500)
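Applied to the pf_Orders function and ps_Orders scheme from the earlier examples, a split and a merge might look like the sketch below. The new boundary value and the choice of filegroup are assumptions for illustration. A split needs a filegroup marked NEXT USED on every scheme that uses the function, so that the new partition has somewhere to live.

```sql
-- Nominate the filegroup that will hold the new partition
-- (reusing fgSalesOrders4 here is an arbitrary choice).
ALTER PARTITION SCHEME ps_Orders NEXT USED [fgSalesOrders4];

-- Add a boundary at the end of the range: 4 partitions become 5.
ALTER PARTITION FUNCTION pf_Orders()
SPLIT RANGE (N'2005-01-01T00:00:00');

-- Remove the oldest boundary: partitions 1 and 2 are combined.
ALTER PARTITION FUNCTION pf_Orders()
MERGE RANGE (N'2002-01-01T00:00:00');
```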
Finally, probably the most useful operation is SWITCH, which moves a
complete partition between tables as a metadata-only change. Be aware
that ALTER TABLE... SWITCH is considered a schema modification and
therefore requires a schema modification (Sch-M) lock on the table.
Using SWITCH together with SPLIT and MERGE, it’s possible to implement
a sliding window, in which a scheduled routine ages data through the
partitions. Here’s an example of the sliding window:
Partition 1 – Current week
Partition 2 – Previous 2 weeks
Partition 3 – Previous 3 months
Partition 4 – Previous 3 to 6 months
Partition 5 – Everything older than 6 months
In this scenario, regular partition maintenance keeps the partitions
holding recent data small, which gives optimal performance for recent
data. Older data is still available; however, retrieval times will
likely be longer since the older partitions and their indexes are
larger.
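One step of a sliding window can be sketched as follows, reusing pf_Orders from the earlier examples. The tables dbo.Orders and dbo.OrdersArchive are hypothetical; the archive table is assumed to be empty, to match dbo.Orders in columns and indexes, and to sit on the same filegroup as the partition being switched out.

```sql
-- Move the oldest partition out of the main table. Only metadata
-- changes, so this completes quickly regardless of row count.
ALTER TABLE dbo.Orders
SWITCH PARTITION 1 TO dbo.OrdersArchive;

-- Partition 1 is now empty, so its boundary can be removed cheaply.
ALTER PARTITION FUNCTION pf_Orders()
MERGE RANGE (N'2002-01-01T00:00:00');
```

A scheduled job would typically pair this with a split at the leading edge of the range so the number of partitions stays constant.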